Duration modeling using cumulative duration probability and speaking rate compensation
Authors
Abstract
A duration modeling scheme and a speaking rate compensation technique are presented for an HMM-based connected digit recognizer. The proposed duration modeling technique uses a cumulative duration probability, which can also be used to obtain the duration bounds for bounded duration modeling. One advantage of the proposed technique is that the cumulative duration probability can be applied directly in the Viterbi decoding procedure, without additional postprocessing, so it governs the state and word transitions at each frame. To alleviate the problems caused by fast or slow speech, a modification to bounded duration modeling that accounts for speaking rate is described. Experimental results on Korean connected digit recognition show the effectiveness of the proposed duration modeling scheme and the speaking rate compensation technique.
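The abstract does not give the formulas, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes the cumulative duration probability is an empirical distribution function over observed state durations (in frames), that duration bounds are read off at tail thresholds, and that speaking rate compensation scales those bounds by a rate factor. All function names, the thresholds 0.05 and 0.95, and the scaling rule are hypothetical choices made for illustration.

```python
import math
from bisect import bisect_left
from itertools import accumulate


def cumulative_duration_probability(durations, max_duration):
    """Empirical CDF: result[d] = P(duration <= d) for d = 0..max_duration.

    `durations` are observed state durations in frames (assumed training data).
    Durations above max_duration are counted in the last bin.
    """
    counts = [0] * (max_duration + 1)
    for d in durations:
        counts[min(d, max_duration)] += 1
    total = len(durations)
    return [c / total for c in accumulate(counts)]


def duration_bounds(cdf, lower_tail=0.05, upper_tail=0.95):
    """Assumed rule: bounds are the first durations whose cumulative
    probability reaches the lower and upper tail thresholds."""
    d_min = bisect_left(cdf, lower_tail)
    d_max = bisect_left(cdf, upper_tail)
    return d_min, d_max


def rate_compensated_bounds(bounds, rate_factor):
    """Assumed compensation: divide the bounds by a speaking rate factor
    (> 1 for fast speech, < 1 for slow speech), keeping at least 1 frame."""
    d_min, d_max = bounds
    return (max(1, math.floor(d_min / rate_factor)),
            max(1, math.ceil(d_max / rate_factor)))


def exit_allowed(elapsed_frames, bounds):
    """Frame-level check a Viterbi decoder could apply before leaving a state."""
    d_min, d_max = bounds
    return d_min <= elapsed_frames <= d_max


# Hypothetical durations (frames) observed for one HMM state.
observed = [3, 4, 4, 5, 5, 5, 6, 6, 7, 10]
cdf = cumulative_duration_probability(observed, max_duration=12)
bounds = duration_bounds(cdf)
fast_bounds = rate_compensated_bounds(bounds, rate_factor=2.0)
```

In a decoder, a check such as `exit_allowed` would be evaluated at each frame, which is one way to restrict state and word transitions during the search rather than rescoring afterwards.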
Similar resources
Synthesis of fast speech with interpolation of adapted HSMMs and its evaluation by blind and sighted listeners
In this paper we evaluate a method for generating synthetic speech at high speaking rates based on the interpolation of hidden semi-Markov models (HSMMs) trained on speech data recorded at normal and fast speaking rates. The subjective evaluation was carried out with both blind listeners, who are used to very fast speaking rates, and sighted listeners. We show that we can achieve a better intel...
Duration modeling for HMM-based speech synthesis
This paper proposes a new approach to state duration modeling for HMM-based speech synthesis. A set of state durations of each phoneme HMM is modeled by a multi-dimensional Gaussian distribution, and duration models are clustered using a decision tree based context clustering technique. In the synthesis stage, state durations are determined by using the state duration models. In this paper, we ...
A Study of Tones and Tempo in Continuous Mandarin Digit Strings and Their Application in Telephone Quality Speech Recognition
Prosodic cues (namely, fundamental frequency, energy and duration) provide important information for speech. For a tonal language such as Chinese, fundamental frequency (F0) plays a critical role in characterizing tone as well, which is an essential phonemic feature. In this paper, we describe our work on duration and tone modeling for telephone-quality continuous Mandarin digits, and the appli...
Speaking rate normalization with lattice-based context-dependent phoneme duration modeling for personalized speech recognizers on mobile devices
Voice access to cloud applications, including social networks, from mobile devices has become attractive, and personalized speech recognizers on mobile devices have become feasible because most mobile devices have only a single user. Speaking rate variation is known to be an important source of performance degradation in spontaneous speech recognition. Speaking rate is speaker dependent, it cha...
Determinants of Unemployment Duration in Iran
This study investigates the effect of individual characteristics on the unemployment duration of job seekers in Iran, using information from the 2018 Labor Force Survey and applying nonparametric, semi-parametric and parametric methods. The results show that the probability of employment is higher for married males and married females than for single people. Regional differences in...